Results 1 - 20 of 20
1.
Mycoses ; 67(1): e13667, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37914666

ABSTRACT

BACKGROUND: Clinical severity scores, such as the Acute Physiology and Chronic Health Evaluation II (APACHE II), the Sequential Organ Failure Assessment (SOFA), the Pitt Bacteremia Score (PBS), and the European Confederation of Medical Mycology Quality (EQUAL) score, may not reliably predict candidemia prognosis because their prespecified scoring limits their adaptability and applicability. OBJECTIVES: Unlike these fixed, prespecified scores, we aimed to develop and validate a machine learning (ML) approach that learns predictive models adaptively from available patient data, increasing adaptability and applicability. METHODS: Different ML algorithms follow different design philosophies and consequently carry different learning biases. We designed an ensemble meta-learner based on stacked generalisation that integrates multiple learners so that their complementary strengths improve predictive performance. RESULTS: In this multicenter retrospective study, we analysed 512 patients with candidemia from January 2014 to July 2019 and compared a stacked generalisation model (SGM) with APACHE II, SOFA, PBS, and EQUAL score for predicting 14-day mortality. Cross-validation showed that the SGM significantly outperformed APACHE II, SOFA, PBS, and EQUAL score across several metrics, including F1-score (0.68, p < .005), Matthews correlation coefficient (0.54, p < .05 vs. SOFA, p < .005 vs. the others), and area under the curve (AUC; 0.87, p < .005). In an independent external validation cohort, the model predicted patient mortality with an AUC of 0.77. CONCLUSIONS: ML models show potential for improving mortality prediction among patients with candidemia compared with clinical severity scores.
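The F1-score and Matthews correlation coefficient (MCC) reported above are both computed from a binary confusion matrix (here, predicted vs. observed 14-day mortality). A minimal sketch of the standard formulas follows; the function name and the convention of returning 0.0 when a denominator is zero are assumptions, not details taken from the study.

```python
import math

def f1_and_mcc(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Return (F1, MCC) for a binary confusion matrix."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # MCC uses all four cells, so it stays informative under class imbalance.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc
```

For example, `f1_and_mcc(tp=40, fp=10, fn=15, tn=135)` scores a hypothetical confusion matrix; the counts are illustrative only. MCC ranges from -1 to 1, which is why it is often preferred over accuracy when deaths are the minority class.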


Subjects
Bacteremia , Candidemia , Humans , Organ Dysfunction Scores , APACHE , Retrospective Studies , Candidemia/diagnosis , Feasibility Studies , Prognosis , Machine Learning , ROC Curve , Intensive Care Units
2.
J Arthroplasty ; 37(1): 132-141, 2022 01.
Article in English | MEDLINE | ID: mdl-34543697

ABSTRACT

BACKGROUND: The criteria outlined at the 2018 International Consensus Meeting (ICM), which are prespecified and fixed, are commonly used by clinicians to diagnose periprosthetic joint infection (PJI). We developed a machine learning (ML) system for PJI diagnosis and compared it with the ICM scoring system to verify the feasibility of ML. METHODS: We designed an ensemble meta-learner that combines 5 learning algorithms to achieve superior performance by optimizing their synergy. To make ML more comprehensible, we developed an explanation generator that produces understandable explanations of individual predictions. We performed stratified 5-fold cross-validation on a cohort of 323 patients to compare the ML meta-learner with the ICM scoring system. RESULTS: Cross-validation showed that ML outperformed the ICM scoring system on various metrics, including accuracy, precision, recall, F1 score, Matthews correlation coefficient, and area under the receiver operating characteristic curve. Moreover, the case study showed that ML could identify personalized important features missing from ICM and provide interpretable decision support for individual diagnosis. CONCLUSION: Unlike ICM, ML can construct adaptive diagnostic models from the available patient data instead of making diagnoses based on prespecified criteria. The experimental results suggest that ML is feasible and competitive for PJI diagnosis compared with the currently widely used ICM scoring criteria. Adaptive ML models can serve as an auxiliary system to ICM for diagnosing PJI.
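Stratified k-fold cross-validation keeps each fold's class ratio (for example, PJI vs. non-PJI) close to that of the whole cohort, so no fold is evaluated on an unrepresentative case mix. The stdlib-only sketch below assumes labels are given as a list and fills folds round-robin within each class after an optional seeded shuffle; the study's actual splitting code is not described in the abstract.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=5, seed=None):
    """Yield (train_indices, test_indices) for k folds that preserve class ratios."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        if seed is not None:
            rng.shuffle(indices)  # optional shuffle within each class
        for j, i in enumerate(indices):
            folds[j % k].append(i)  # deal class members round-robin across folds
    for f in range(k):
        test = sorted(folds[f])
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        yield train, test
```

Usage: `for train, test in stratified_kfold(labels, k=5, seed=0): ...` trains on `train` and scores on `test`; each index appears in exactly one test fold.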


Subjects
Arthritis, Infectious , Prosthesis-Related Infections , Humans , Machine Learning , Prosthesis-Related Infections/diagnosis , ROC Curve , Retrospective Studies
3.
Int J Mol Sci ; 22(12)2021 Jun 15.
Article in English | MEDLINE | ID: mdl-34203772

ABSTRACT

Protein-protein interactions (PPIs) underlie most biological functions and are determined by residue-residue interactions (RRIs). Predicting the residue pairs responsible for an interaction is crucial for understanding the causes of disease and for drug design. Computational approaches, considered inexpensive and fast solutions for RRI prediction, have been widely used to predict protein interfaces for further analysis. This study presents RRI-Meta, an ensemble meta-learning-based method for RRI prediction. Its hierarchical learning structure comprises four base classifiers and one meta-classifier that integrates the predictive strengths of the different classifiers. It considers multiple feature types, including sequence-, structure-, and neighbor-based features, to characterize different properties of a residue's interaction environment and better distinguish interacting from noninteracting residues. To demonstrate RRI-Meta's performance, we repeated experiments previously reported in the literature using the same data. Experimental results show that RRI-Meta is superior to several current prediction tools. Additionally, to analyze the factors that affect the performance of RRI-Meta, we conducted a comparative case study using different protein complexes.


Subjects
Algorithms , Amino Acids/metabolism , Computational Biology/methods , Area Under Curve , Models, Molecular , ROC Curve
4.
Methods Mol Biol ; 2131: 375-397, 2020.
Article in English | MEDLINE | ID: mdl-32162268

ABSTRACT

One of the major challenges in vaccine design is identifying B-cell epitopes in continuously evolving viruses. Various tools have been developed to predict linear or conformational epitopes, each relying on different physicochemical properties and adopting distinct search strategies. In this chapter, we propose several ensemble meta-learning approaches for epitope prediction based on stacked generalization, cascade generalization, and meta decision trees. Through meta learning, we expect a meta learner to integrate multiple prediction models and outperform the single best-performing model. The objective of this chapter is twofold: (1) to promote the complementary predictive strengths of different prediction tools and (2) to introduce computational models that exploit the synergy among various prediction tools. Our primary goal is not to develop any particular classifier for B-cell epitope prediction, but to advocate the feasibility of meta learning for epitope prediction. With the flexibility of meta learning, researchers can construct various meta classification hierarchies applicable to epitope prediction in different protein domains.


Subjects
Computational Biology/methods , Epitopes, B-Lymphocyte/chemistry , Algorithms , Decision Trees , Drug Design , Epitopes, B-Lymphocyte/immunology , Humans , Immunologic Memory , Molecular Conformation
5.
BMC Bioinformatics ; 20(1): 308, 2019 Jun 10.
Article in English | MEDLINE | ID: mdl-31182027

ABSTRACT

BACKGROUND: Although various machine learning-based predictors have been developed for estimating protein-protein interactions, their performance varies with dataset and species and is affected by two primary aspects: the choice of learning algorithm and the representation of protein pairs. To improve the prediction of protein-protein interactions, we exploit the synergy of multiple learning algorithms and the expressiveness of different protein-pair features. RESULTS: We developed a stacked generalization scheme that integrates five learning algorithms. We also designed three types of protein-pair features based on the physicochemical properties of amino acids, gene ontology annotations, and interaction network topologies. When tested on 19 published datasets collected from eight species, the proposed approach achieved significantly higher or comparable overall performance compared with seven competitive predictors. CONCLUSION: We introduced an ensemble learning approach for PPI prediction that integrates multiple learning algorithms and different protein-pair representations. Extensive comparisons with other state-of-the-art prediction tools demonstrated the feasibility and superiority of the proposed method.
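Stacked generalization trains a level-1 (meta) learner on the out-of-fold predictions of the level-0 (base) learners, so the meta-learner sees how each base model behaves on data it was not trained on. The sketch below is a toy, stdlib-only illustration, not the authors' five-algorithm scheme: the two base learners (a 3-nearest-neighbour vote and a nearest-centroid score), the logistic-regression meta-learner, and fold assignment by index modulo k are all assumptions chosen for brevity. It also assumes 0/1 labels and that every training fold contains both classes.

```python
import math

def knn_fit(X, y, k=3):
    """Base learner 1: fraction of positive labels among the k nearest neighbours."""
    def predict(x):
        nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], x))[:k]
        return sum(y[i] for i in nearest) / len(nearest)
    return predict

def centroid_fit(X, y):
    """Base learner 2: relative closeness to the positive-class centroid."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0 = mean([x for x, t in zip(X, y) if t == 0])
    c1 = mean([x for x, t in zip(X, y) if t == 1])
    def predict(x):
        d0, d1 = math.dist(x, c0), math.dist(x, c1)
        return d0 / (d0 + d1) if d0 + d1 else 0.5
    return predict

def logistic_fit(Z, y, lr=0.5, epochs=2000):
    """Meta-learner: logistic regression trained by batch gradient descent."""
    w = [0.0] * (len(Z[0]) + 1)  # last entry is the bias
    def prob(z):
        # Reads the current binding of w, so it reflects the trained weights.
        s = sum(wi * zi for wi, zi in zip(w, z)) + w[-1]
        return 1.0 / (1.0 + math.exp(-s))
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for z, t in zip(Z, y):
            err = prob(z) - t
            for j, zj in enumerate(z):
                grad[j] += err * zj
            grad[-1] += err
        w = [wi - lr * g / len(Z) for wi, g in zip(w, grad)]
    return prob

def stack_fit(X, y, base_fits, k=5):
    """Train base learners out-of-fold, then fit the meta-learner on their outputs."""
    n = len(X)
    Z = [[0.0] * len(base_fits) for _ in range(n)]
    for f in range(k):
        train = [i for i in range(n) if i % k != f]
        test = [i for i in range(n) if i % k == f]
        for b, fit in enumerate(base_fits):
            model = fit([X[i] for i in train], [y[i] for i in train])
            for i in test:
                Z[i][b] = model(X[i])  # out-of-fold prediction
    meta = logistic_fit(Z, y)
    final = [fit(X, y) for fit in base_fits]  # base learners refit on all data
    return lambda x: meta([m(x) for m in final])
```

Usage: `predict = stack_fit(X, y, [knn_fit, centroid_fit])` returns a function giving the meta-learner's probability of the positive class. Training the meta-learner on out-of-fold rather than in-sample base predictions is the design choice that keeps it from simply trusting whichever base learner overfits most.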


Subjects
Algorithms , Protein Interaction Mapping/methods , Animals , Area Under Curve , Databases, Protein , Gene Ontology , Humans , Molecular Sequence Annotation
6.
IEEE J Biomed Health Inform ; 22(1): 265-275, 2018 01.
Article in English | MEDLINE | ID: mdl-28212102

ABSTRACT

Several factors contribute to individual variability in postoperative pain; therefore, individuals consume postoperative analgesics at different rates. Although many statistical studies have analyzed postoperative pain and analgesic consumption, most have identified only correlations and have not further tested their statistical models to evaluate predictive accuracy. In this study involving 3052 patients, a multistrategy computational approach was developed for predicting analgesic consumption. This approach uses data on patient-controlled analgesia demand behavior over time and combines clustering, classification, and regression to mitigate the limitations of current statistical models. Cross-validation results indicated that the proposed approach significantly outperforms various existing regression methods. Moreover, on an independent test data set of 60 patients, a comparison between the predictions of anesthesiologists and medical specialists and those of the computational approach further showed the superiority of the computational approach, which produced markedly lower root mean squared errors.
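The comparison above is scored by root mean squared error, RMSE = sqrt((1/n) * sum over i of (y_i - yhat_i)^2), where y_i is the observed and yhat_i the predicted analgesic consumption. A small sketch of that formula; the function name is hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between observed and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

Lower RMSE means predictions are closer on average; because errors are squared before averaging, a few large misses raise RMSE more than they would raise the mean absolute error.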


Subjects
Analgesia, Patient-Controlled/methods , Analgesia, Patient-Controlled/statistics & numerical data , Analgesics/administration & dosage , Models, Statistical , Adult , Aged , Cluster Analysis , Female , Humans , Male , Middle Aged , Pattern Recognition, Automated , Regression Analysis
7.
BMC Bioinformatics ; 15: 378, 2014 Nov 18.
Article in English | MEDLINE | ID: mdl-25403375

ABSTRACT

BACKGROUND: One of the major challenges in vaccine design is identifying B-cell epitopes in continuously evolving viruses. Various tools have been developed to predict linear or conformational epitopes, each relying on different physicochemical properties and adopting distinct search strategies. We propose a meta-learning approach for epitope prediction based on stacked and cascade generalizations. Through meta learning, we expect a meta learner to be able to integrate multiple prediction models and outperform the single best-performing model. The objective of this study is twofold: (1) to analyze the complementary predictive strengths of different prediction tools, and (2) to introduce a generic computational model that exploits the synergy among various prediction tools. Our primary goal is not to develop any particular classifier for B-cell epitope prediction, but to advocate the feasibility of meta learning for epitope prediction. With the flexibility of meta learning, researchers can construct various meta classification hierarchies applicable to epitope prediction in different protein domains. RESULTS: We developed hierarchical meta-learning architectures based on stacked and cascade generalizations. The bottom level of the hierarchy consisted of four conformational and four linear epitope prediction tools that served as the base learners. To perform consistent and unbiased comparisons, we tested the meta-learning method on an independent set of antigen proteins that had not been used to train the base epitope prediction tools. In addition, we conducted correlation and ablation studies of the base learners in the meta-learning model. Low correlation among the predictions of the base learners suggested that the eight base learners had complementary predictive capabilities. The ablation analysis indicated that the eight base learners interacted differently and made different contributions to the final meta model.
The results of the independent test demonstrated that the meta-learning approach markedly outperformed the single best-performing epitope predictor. CONCLUSIONS: Computational B-cell epitope prediction tools exhibit several differences that affect their performances when predicting epitopic regions in protein antigens. The proposed meta-learning approach for epitope prediction combines multiple prediction tools by integrating their complementary predictive strengths. Our experimental results demonstrate the superior performance of the combined approach in comparison with single epitope predictors.


Subjects
Epitopes, B-Lymphocyte/chemistry , Algorithms , Artificial Intelligence , B-Lymphocytes/immunology , Computer Simulation , Epitopes, B-Lymphocyte/immunology , Models, Biological , Molecular Conformation
8.
PLoS One ; 9(1): e84638, 2014.
Article in English | MEDLINE | ID: mdl-24416256

ABSTRACT

An increase in the availability of data on influenza A viruses (IAV) has enabled the identification of potential determinants of IAV host specificity using computational approaches. In this study, we proposed an alternative approach, based on the adjusted Rand index (ARI), for evaluating genomic signatures of IAVs and their ability to distinguish the hosts they infect. Our experiments showed that the host-specific signatures identified using the ARI were more characteristic of their hosts than those identified using previous measures. Our results provide updates on the host-specific genomic signatures in the internal proteins of IAV based on the sequence data available in the National Center for Biotechnology Information (NCBI) as of February 2013. Unlike other approaches to signature recognition, our approach considered not only the ability of signatures to distinguish hosts (according to the ARI) but also the chronological relationships among proteins. We identified novel signatures that could be mapped to known functional domains and introduced a chronological analysis to investigate changes in host-specific genomic signatures over time. This chronological analysis revealed adaptive variability of signatures consistent with previous studies' findings and indicated prospective adaptation trends that warrant further investigation.
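The adjusted Rand index compares two partitions of the same items and corrects the raw pair-agreement count for chance: 1.0 means identical groupings (up to relabelling), and values near 0 mean chance-level agreement. One way to read the approach above, offered as an assumption rather than the authors' exact procedure, is to partition viral sequences by the residue at a candidate signature position and compare that partition with the partition by host. A stdlib sketch:

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same n items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same items")
    n = len(labels_a)
    if n < 2:
        return 1.0  # with fewer than two items there are no pairs to disagree on
    # Pairs placed together in both labelings (contingency-table cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)  # chance agreement
    max_index = (together_a + together_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (together_both - expected) / (max_index - expected)
```

For a hypothetical position where residue K occurs only in human isolates and E only in avian ones, `adjusted_rand_index(["K", "K", "E", "E"], ["human", "human", "avian", "avian"])` returns 1.0; the residues and hosts are illustrative.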


Subjects
Genomics , Influenza A virus/genetics , Animals , Databases, Genetic , Genome, Viral/genetics , Host Specificity , Humans , Influenza A virus/physiology , Time Factors
9.
BMC Med Inform Decis Mak ; 12: 131, 2012 Nov 14.
Article in English | MEDLINE | ID: mdl-23148492

ABSTRACT

BACKGROUND: Appropriate postoperative pain management contributes to earlier mobilization, shorter hospitalization, and reduced cost. Undertreatment of pain may impede short-term recovery and have a detrimental long-term effect on health. This study focuses on Patient Controlled Analgesia (PCA), a delivery system for pain medication, and demonstrates how machine learning and data mining techniques can be used to predict analgesic requirements and PCA readjustment. METHODS: The sample included 1099 patients. Each patient was described by 280 attributes, including the class attribute. In addition to commonly studied demographic and physiological factors, this study emphasizes attributes related to PCA. We used decision tree-based learning algorithms to predict analgesic consumption and PCA control readjustment based on the first few hours of PCA medication. We also developed a nearest neighbor-based data cleaning method to alleviate the class-imbalance problem in PCA setting readjustment prediction. RESULTS: The prediction accuracies of total analgesic consumption (continuous dose and PCA dose) and PCA analgesic requirement (PCA dose only) by an ensemble of decision trees were 80.9% and 73.1%, respectively. Decision tree-based learning outperformed Artificial Neural Network, Support Vector Machine, Random Forest, Rotation Forest, and Naïve Bayesian classifiers in analgesic consumption prediction. The proposed data cleaning method improved the performance of every learning method in this study for PCA setting readjustment prediction. Comparative analysis identified the informative attributes from the data mining models and compared them with the correlates of analgesic requirement reported in previous work. CONCLUSION: This study presents a real-world application of data mining to anesthesiology. Unlike previous research, this study considers a wider variety of predictive factors, including PCA demands over time.
We analyzed PCA patient data and conducted several experiments to evaluate the potential of applying machine-learning algorithms to assist anesthesiologists in PCA administration. Results demonstrate the feasibility of the proposed ensemble approach to postoperative pain management.
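The abstract does not specify the nearest neighbor-based cleaning rule. As an assumption, the sketch below follows the spirit of Wilson's edited nearest neighbour rule restricted to the majority class: a majority-class example is dropped when most of its k nearest neighbours carry a different label, which thins majority examples near the class boundary. The function name, Euclidean distance, and k=3 are illustrative choices.

```python
import math
from collections import Counter

def clean_majority(X, y, k=3):
    """Drop majority-class examples whose k nearest neighbours mostly disagree with them."""
    majority = Counter(y).most_common(1)[0][0]
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != majority:
            keep.append(i)  # minority examples are always kept
            continue
        neighbours = sorted((j for j in range(len(X)) if j != i),
                            key=lambda j: math.dist(X[j], xi))[:k]
        same = sum(1 for j in neighbours if y[j] == yi)
        if 2 * same > len(neighbours):  # own class is the local majority
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]
```

Only majority examples are removed, so the cleaned set is less imbalanced without discarding any of the rarer readjustment cases.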


Subjects
Analgesia, Patient-Controlled , Artificial Intelligence , Decision Trees , Drug Administration Schedule , Pain, Postoperative/drug therapy , Age Factors , Algorithms , Analgesia, Patient-Controlled/classification , Analysis of Variance , Blood Pressure/physiology , Chicago , Female , Heart Rate/physiology , Humans , Male , Neural Networks, Computer , Outcome and Process Assessment, Health Care/standards , Pain Management/instrumentation , Predictive Value of Tests , Retrospective Studies , Risk Factors , Socioeconomic Factors
10.
Comput Biol Med ; 42(10): 1005-11, 2012 Oct.
Article in English | MEDLINE | ID: mdl-22959278

ABSTRACT

Unlike previous research on patient controlled analgesia, this study explores patient demand behavior over time. We apply clustering methods to disclose demand patterns among patients over the first 24 h of analgesic medication after surgery. We consider demographic, biomedical, and surgery-related data in statistical analyses to determine predictors of patient demand behavior, and use stepwise regression and Bayes risk analysis to evaluate the influence of demand pattern on analgesic requirements. We identify three demand patterns from 1655 patient controlled analgesia request log files. Statistical tests show correlations of gender (p=.0022), diastolic blood pressure (p=.025), surgery type (p=.0028), and surgical duration (p<.0095) with demand patterns. Stepwise regression and Bayes risk analysis show that demand pattern plays the most important role in predicting analgesic consumption (p=0.E+0). This study suggests that patients exhibit distinct analgesia request patterns over time and that clustering can disclose these demand behavior patterns.


Subjects
Analgesia, Patient-Controlled/statistics & numerical data , Pain Measurement/methods , Pain/drug therapy , Pattern Recognition, Automated/methods , Adult , Aged , Cluster Analysis , Female , Humans , Male , Middle Aged
11.
Comput Biol Med ; 42(1): 93-105, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22099701

ABSTRACT

This study proposes a general framework for structural motif discovery. The framework is based on a modular design in which system components can be modified or replaced independently to increase its applicability to various studies. It is a two-stage approach that first converts protein 3D structures into structural alphabet sequences and then applies a sequence motif-finding tool to these sequences to detect conserved motifs. We named the resulting structural motif database SA-Motifbase; it provides the structural information conserved at different hierarchical levels in SCOP. For each motif, SA-Motifbase presents its 3D view, alphabet letter preference, alphabet letter frequency distribution, and significance. SA-Motifbase is available at http://bioinfo.cis.nctu.edu.tw/samotifbase/.


Subjects
Computational Biology/methods , Database Management Systems , Databases, Protein , Proteins/chemistry , Sequence Analysis, Protein/methods , Amino Acid Motifs , Amino Acid Sequence , Molecular Sequence Data , Software
12.
Adv Exp Med Biol ; 680: 117-23, 2010.
Article in English | MEDLINE | ID: mdl-20865493

ABSTRACT

Although the increasing number of available 3D protein structures has made a wide variety of computational protein structure research possible, its success is still hindered by the high computational complexity of working in 3D. Based on 3D information, several 1D protein structural alphabets have been developed that can not only describe the global folding structure of a protein as a 1D sequence but also characterize local structures in proteins. Instead of applying computationally intensive 3D structure alignment tools, we introduce an approach that combines standard 1D motif detection methods with structural alphabets to discover locally conserved protein motifs. These 1D structural motifs can serve as group features that characterize protein groups at different levels in SCOP, e.g., families, superfamilies, and folds.


Subjects
Algorithms , Databases, Protein/statistics & numerical data , Proteins/chemistry , Amino Acid Motifs , Amino Acid Sequence , Computational Biology , Models, Molecular , Protein Structure, Secondary , Proteins/genetics , Sequence Alignment/statistics & numerical data , Sequence Analysis, Protein/statistics & numerical data , Software Design
13.
BMC Bioinformatics ; 11: 411, 2010 Aug 03.
Article in English | MEDLINE | ID: mdl-20682075

ABSTRACT

BACKGROUND: At present, the organization of system modules is typically limited to either a multilevel hierarchy that describes the "vertical" relationships between modules at different levels (e.g., module A at level two is included in module B at level one), or a single-level graph that represents the "horizontal" relationships among modules (e.g., genetic interactions between module A and module B). Both types of organization fail to provide a broader and deeper view of the complex systems that arise from an integration of vertical and horizontal relationships. RESULTS: We propose a complex network analysis tool, Pyramabs, which was developed to integrate vertical and horizontal relationships and extract information at various granularities to create a pyramid from a complex system of interacting objects. The pyramid depicts the nested structure implied in a complex system and shows the vertical relationships between abstract networks at different levels. In addition, at each level, the abstract network of modules, which are connected by weighted links, represents the modules' horizontal relationships. We first tested Pyramabs on hierarchical random networks to verify its ability to find the module organization pre-embedded in the networks. We then tested it on a protein-protein interaction (PPI) network and a metabolic network. According to Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG), the vertical relationships identified from the PPI and metabolic pathways correctly characterized the inclusion (i.e., part-of) relationship, and the horizontal relationships provided a good indication of the functional closeness between modules. Our experiments with Pyramabs demonstrated its ability to perform knowledge mining in complex systems.
CONCLUSIONS: Networks are a flexible and convenient method of representing interactions in a complex system, and an increasing amount of information in real-world situations is described by complex networks. We considered the analysis of a complex network as an iterative process for extracting meaningful information at multiple granularities from a system of interacting objects. The quality of the interpretation of the networks depends on the completeness and expressiveness of the extracted knowledge representations. Pyramabs was designed to interpret a complex network through a disclosure of a pyramid of abstractions. The abstraction pyramid is a new knowledge representation that combines vertical and horizontal viewpoints at different degrees of abstraction. Interpretations in this form are more accurate and more meaningful than multilevel dendrograms or single-level graphs. Pyramabs can be accessed at http://140.113.166.165/pyramabs.php/.


Subjects
Information Storage and Retrieval/methods , Metabolic Networks and Pathways , Algorithms , Cluster Analysis , Escherichia coli/metabolism , Proteins/metabolism , Saccharomyces cerevisiae/metabolism
14.
BMC Bioinformatics ; 9: 349, 2008 Aug 22.
Article in English | MEDLINE | ID: mdl-18721472

ABSTRACT

BACKGROUND: Structural similarities among proteins can provide valuable insight into their functional mechanisms and relationships. As the number of available three-dimensional (3D) protein structures increases, a greater variety of studies can be conducted with increasing efficiency, among them the design of protein structural alphabets. Structural alphabets allow us to characterize local structures of proteins and describe the global folding structure of a protein using a one-dimensional (1D) sequence. Thus, 1D sequences can be used to identify structural similarities among proteins using standard sequence alignment tools such as BLAST or FASTA. RESULTS: We used self-organizing maps in combination with a minimum spanning tree algorithm to determine the optimum size of a structural alphabet and applied the k-means algorithm to group protein fragments into clusters. The centroids of these clusters defined the structural alphabet. We also developed a flexible matrix training system to build a substitution matrix (TRISUM-169) for our alphabet. Based on FASTA and using TRISUM-169 as the substitution matrix, we developed the SA-FAST alignment tool. We compared the performance of SA-FAST with that of various search tools in database-scale search tasks and found that SA-FAST was highly competitive in all tests conducted. Further, we evaluated the performance of our structural alphabet in recognizing specific structural domains of EGF and EGF-like proteins. Our method recovered more EGF sub-domains using our structural alphabet than when using other structural alphabets. SA-FAST can be found at http://140.113.166.178/safast/. CONCLUSION: The goal of this project was twofold. First, we wanted to introduce a modular design pipeline to those who have been working with structural alphabets. Second, we wanted to open the door to researchers who have done substantial work on biological sequences but have yet to enter the field of protein structure research.
Our experiments showed that by transforming the structural representations from 3D to 1D, several 1D-based tools can be applied to structural analysis, including similarity searches and structural motif finding.
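The alphabet construction described above (cluster fragments with k-means, treat each centroid as a letter, then rewrite a structure as a 1D letter string) can be sketched in a few lines. Assumptions: each fragment is already a fixed-length numeric vector (for example, backbone angles), the number of letters k is given rather than chosen with self-organizing maps and a minimum spanning tree, and letters are simply A, B, C, and so on. This is not the TRISUM-169 / SA-FAST pipeline itself.

```python
import math
import random
import string

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns the k centroids (the alphabet 'letters')."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # New centroid = mean of its cluster; an empty cluster keeps its old centroid.
        updated = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[c]
                   for c, cl in enumerate(clusters)]
        if updated == centroids:  # centroids no longer change
            break
        centroids = updated
    return centroids

def encode(fragments, centroids):
    """Rewrite a structure (list of fragment vectors) as a 1D letter string."""
    if len(centroids) > 26:
        raise ValueError("this sketch only has 26 letters")
    letters = string.ascii_uppercase
    return "".join(
        letters[min(range(len(centroids)), key=lambda c: math.dist(f, centroids[c]))]
        for f in fragments
    )
```

Once structures are strings, ordinary sequence tools (alignment with a substitution matrix, motif finders) can operate on them, which is the point the abstract makes about moving from 3D to 1D.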


Subjects
Databases, Protein , Sequence Alignment/methods , Algorithms , Cluster Analysis , Methods , Proteins/chemistry
15.
BMC Bioinformatics ; 9 Suppl 6: S3, 2008 May 28.
Article in English | MEDLINE | ID: mdl-18541056

ABSTRACT

BACKGROUND: DNA-binding proteins are of utmost importance to gene regulation. The identification of DNA-binding domains is useful for understanding the regulation mechanisms of DNA-binding proteins. In this study, we proposed a method to determine whether a domain or a protein has DNA-binding capability by considering the evolutionary conservation of DNA-binding residues. RESULTS: Our method achieves high precision and recall for 66 families of DNA-binding domains, with a false positive rate below 5% for 250 non-DNA-binding proteins. In addition, experimental results show that our method can identify the different DNA-binding behaviors of proteins in the same SCOP family based on the evolutionary conservation of DNA-contact residues. CONCLUSION: This study shows the conservation of DNA-contact residues in DNA-binding domains. We conclude that members of the same subfamily bind DNA specifically and that members of different subfamilies often recognize different DNA targets. Additionally, we observe the co-evolution of DNA-contact residues and interacting DNA base pairs.


Subjects
Conserved Sequence/genetics , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/genetics , DNA/chemistry , DNA/genetics , Evolution, Molecular , Sequence Analysis, Protein/methods , Amino Acid Sequence , Base Sequence , Binding Sites , Molecular Sequence Data , Protein Binding , Sequence Analysis, DNA/methods
16.
Comput Methods Programs Biomed ; 78(3): 209-22, 2005 Jun.
Article in English | MEDLINE | ID: mdl-15899306

ABSTRACT

Accurate functional prediction plays an important role in the era of proteomics. When annotating an unknown gene, reliable functional information can be extracted from its orthologs in other species. Here, a site-based approach called PORFIS is proposed to predict orthologous relationships. When applied to the bacterial transcription factor PurR/LacI family and the protein kinase AGC family, our method identified, with few false positives, important sites that agree with those verified by biological experiments. We also tested it on the alpha-proteasome family, the glycoprotein hormone family, and the growth hormone family to demonstrate its ability to predict orthologous relationships. Compared with other prediction methods based on phylogenetic analysis or hidden Markov models, PORFIS not only has competitive prediction accuracy but also provides valuable biological information about functionally important sites associated with orthologs, which can be further studied in biological experiments.


Subjects
Proteomics , Sequence Analysis, Protein , Humans , Protein Conformation , Software Design , Taiwan
17.
Nucleic Acids Res ; 31(13): 3446-9, 2003 Jul 01.
Article in English | MEDLINE | ID: mdl-12824343

ABSTRACT

RNA molecules play an important role in many biological activities. Knowing an RNA molecule's secondary structure can help us better understand its function. RNA structure has traditionally been determined through biochemical, biophysical, and phylogenetic analyses. With advances in computer technology, an increasing number of computational approaches have recently been developed. They have different goals and apply various algorithms. For example, some focus on secondary structure prediction for a single sequence, while others aim to find a global alignment of multiple sequences. Some predict the structure based on free energy minimization; others use comparative sequence analysis to determine the structure. In this paper, we describe how to correctly use GPRM, a genetic programming approach to finding common secondary structure elements in a set of unaligned coregulated or homologous RNA sequences. GPRM can be accessed at http://bioinfo.cis.nctu.edu.tw/service/gprm/.


Subjects
RNA/chemistry , Sequence Analysis, RNA/methods , Software , Internet , Mutation , Nucleic Acid Conformation , RNA/genetics , Stochastic Processes
18.
Comput Methods Programs Biomed ; 70(1): 11-20, 2003 Jan.
Article in English | MEDLINE | ID: mdl-12468123

ABSTRACT

Biologists have found that gene expression is controlled and regulated primarily by relatively short sequences in the region surrounding a gene. These sequences vary in length, position, redundancy, orientation, and bases. Finding these short sequences is a fundamental problem in molecular biology with important applications. Although many different approaches to signal (i.e., short sequence) finding exist, recent work shows that this problem still leaves plenty of room for improvement. In 2000, Pevzner and Sze proposed the Challenge Problem of motif detection and reported that most current motif-finding algorithms are incapable of detecting its target motifs. In this paper, we show that our new algorithm, using an iterative-restart design, can correctly find the target motifs. Furthermore, because some transcription factors form dimers or even more complex structures, and transcription can sometimes involve multiple factors with variable spacers between them, we extend the original problem to an even more challenging one that addresses combinatorial signals with gaps of variable length. To demonstrate the effectiveness of our algorithm, we tested it on a series of instances of the new challenge problem as well as on real regulons, and compared it with some current representative motif-finding algorithms.
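The (l, d) formulation behind the Challenge Problem asks for a length-l pattern that occurs, with at most d mismatches, in every input sequence (Pevzner and Sze used l = 15, d = 4). The naive sketch below only tests the l-mers of the first sequence as candidate consensus patterns. It is a baseline that makes the problem concrete, not the iterative-restart algorithm described above, and it can miss a planted motif whose exact consensus never appears verbatim in the first sequence; the function names are hypothetical.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def occurs_within(pattern, seq, d):
    """True if some window of seq matches pattern with at most d mismatches."""
    l = len(pattern)
    return any(hamming(pattern, seq[i:i + l]) <= d for i in range(len(seq) - l + 1))

def candidate_motifs(seqs, l, d):
    """Naive baseline: keep l-mers of seqs[0] that occur (<= d mismatches) in every other sequence."""
    first = seqs[0]
    found = set()
    for i in range(len(first) - l + 1):
        candidate = first[i:i + l]
        if all(occurs_within(candidate, s, d) for s in seqs[1:]):
            found.add(candidate)
    return found
```

For instance, with the toy sequences `["ACGTTGCA", "TTACGATT", "GGACGTAA"]`, `candidate_motifs(..., l=4, d=1)` includes `"ACGT"`, which appears with one mismatch (`ACGA`) in the second sequence and exactly in the third. Real instances are much harder because random background l-mers can also fall within d mismatches of each other.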


Subjects
DNA/chemistry , Sequence Analysis, DNA , Algorithms , Gene Expression Regulation
19.
Nucleic Acids Res ; 30(17): 3886-93, 2002 Sep 01.
Article in English | MEDLINE | ID: mdl-12202774

ABSTRACT

Given a set of homologous or functionally related RNA sequences, the consensus motifs may represent the binding sites of RNA regulatory proteins. Unlike DNA motifs, RNA motifs are more conserved in structure than in sequence. Knowing the structural motifs can give us deeper insight into regulatory activities. There have been various studies of RNA secondary structure prediction, but most do not focus on finding motifs in sets of functionally related sequences. Although recent research shows some new approaches to RNA motif finding, they are limited to finding relatively simple structures, e.g., stem-loops. In this paper, we propose a novel genetic programming approach to RNA secondary structure prediction that can find structures more complex than stem-loops. To demonstrate the performance of our new approach and keep our comparative study consistent, we first tested it on the same data sets previously used to verify current prediction systems. To show the flexibility of our new approach, we also tested it on a data set containing pseudoknot motifs, which most current systems cannot identify. A web-based user interface to the prediction system is available at http://bioinfo.cis.nctu.edu.tw/service/gprm/.


Subjects
Nucleic Acid Conformation , RNA/chemistry , Software , Algorithms , Base Sequence , Computational Biology/methods , Mutation , RNA/genetics , Reproducibility of Results , Sequence Homology, Nucleic Acid
20.
Bioinformatics ; 18(8): 1145-6, 2002 Aug.
Article in English | MEDLINE | ID: mdl-12176843

ABSTRACT

UNLABELLED: Most of the current bioinformatics literature lacks an explicit and clear description of the data used in experiments when introducing or evaluating computational tools. Without the exact data set that was fed into the computational tools, any mistake in preparing the data for later experiments may lead to discrepancies in conclusions. The NCTU BioInfo Archive is a new web-based bioinformatics data archive. It serves as a test bed for evaluating computational tools, as a bridge linking other research communities with bioinformatics, and as an environment that encourages more exploratory research in bioinformatics. AVAILABILITY: http://bioinfo.cis.nctu.edu.tw


Subjects
Computing Methodologies , Database Management Systems , Databases, Genetic , Information Storage and Retrieval/methods , Sequence Analysis/methods , Computational Biology , Databases, Factual , Internet